The objectives of the exercises of this assignment are:
1) Download and organise the data and model and plot staircase responses based on fits of logistic functions
2) Fit multilevel models for response times
3) Fit multilevel models for count data
REMEMBER: In your report, make sure to include code that can reproduce the answers requested in the exercises below (MAKE A KNITTED VERSION)
REMEMBER: This assignment will be part of your final portfolio
Go to https://osf.io/ecxsj/files/ and download the files associated with Experiment 2 (there should be 29).
The data is associated with Experiment 2 of the article at the following DOI https://doi.org/10.1016/j.concog.2019.03.007
# read_bulk() comes from the readbulk package
data_exp_2 <- read_bulk("experiment_2", extension = ".csv")
## Reading 001.csv
## Reading 002.csv
## Reading 003.csv
## Reading 004.csv
## Reading 005.csv
## Reading 006.csv
## Reading 007.csv
## Reading 008.csv
## Reading 009.csv
## Reading 010.csv
## Reading 011.csv
## Reading 012.csv
## Reading 013.csv
## Reading 014.csv
## Reading 015.csv
## Reading 016.csv
## Reading 017.csv
## Reading 018.csv
## Reading 019.csv
## Reading 020.csv
## Reading 021.csv
## Reading 022.csv
## Reading 023.csv
## Reading 024.csv
## Reading 025.csv
## Reading 026.csv
## Reading 027.csv
## Reading 028.csv
## Reading 029.csv
head(data_exp_2)
## trial.type pas trial jitter.x jitter.y odd.digit target.contrast
## 1 staircase 4 0 -0.34252959 0.44888443 9 1.0
## 2 staircase 4 1 0.06229222 0.02914578 9 1.0
## 3 staircase 4 2 -0.40575096 0.49951833 7 0.9
## 4 staircase 4 3 -0.36154113 -0.22243588 7 0.9
## 5 staircase 4 4 0.28907575 0.41321450 7 0.8
## 6 staircase 4 5 0.08244807 -0.09336518 7 0.8
## target.frames cue task target.type rt.subj rt.obj even.digit seed
## 1 3 29 pairs odd 1.7535665 1.1578644 8 9817
## 2 3 29 pairs even 1.3902975 2.4563786 4 9817
## 3 3 29 pairs even 0.6862642 0.9509719 4 9817
## 4 3 29 pairs odd 0.6529463 0.6203329 4 9817
## 5 3 29 pairs even 0.5817340 0.9503946 4 9817
## 6 3 29 pairs even 0.5484627 0.8314905 4 9817
## obj.resp subject File
## 1 o 1 001.csv
## 2 e 1 001.csv
## 3 e 1 001.csv
## 4 o 1 001.csv
## 5 e 1 001.csv
## 6 e 1 001.csv
i. add a variable to the data frame and call it _correct_ (have it be a _logical_ variable). Assign a 1 to each row where the subject indicated the correct answer and a 0 to each row where the subject indicated the incorrect answer (__Hint:__ the variable _obj.resp_ indicates whether the subject answered "even", _e_, or "odd", _o_, and the variable _target.type_ indicates what was actually presented).
data_exp_2$correct <- ifelse(grepl("odd", data_exp_2$target.type) & grepl("o", data_exp_2$obj.resp), 1, ifelse(grepl("even", data_exp_2$target.type) & grepl("e", data_exp_2$obj.resp), 1, 0))
#making it a factor variable instead of numeric
data_exp_2$correct <- as.factor(data_exp_2$correct)
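A terser, equivalent construction is possible, exploiting that the first letter of _target.type_ ("o"/"e") equals _obj.resp_ exactly on correct trials. The sketch below runs on toy vectors standing in for the real columns:

```r
# Toy vectors standing in for the real columns (illustrative only)
target.type <- c("odd", "even", "even", "odd")
obj.resp    <- c("o",   "o",    "e",    "o")

# Correct exactly when the first letter of target.type matches obj.resp
correct <- as.integer(substr(target.type, 1, 1) == obj.resp)
correct  # 1 0 1 1
```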
ii. describe what the following variables in the data frame contain, _trial.type_, _pas_, _trial_, _target.contrast_, _cue_, _task_, _target_type_, _rt.subj_, _rt.obj_, _obj.resp_, _subject_ and _correct_. (That means you can ignore the rest of the variables in your description). For each of them, indicate and argue for what `class` they should be classified into, e.g. _factor_, _numeric_ etc.
summary(data_exp_2)
## trial.type pas trial jitter.x
## Length:18131 Min. :1.000 Min. : 0.0 Min. :-0.4998692
## Class :character 1st Qu.:1.000 1st Qu.: 78.0 1st Qu.:-0.2512119
## Mode :character Median :2.000 Median :166.0 Median : 0.0040918
## Mean :2.312 Mean :182.3 Mean :-0.0000059
## 3rd Qu.:3.000 3rd Qu.:275.0 3rd Qu.: 0.2477540
## Max. :4.000 Max. :431.0 Max. : 0.4998842
## jitter.y odd.digit target.contrast target.frames
## Min. :-0.4999492 Min. :3.000 Min. :0.01000 Min. :3
## 1st Qu.:-0.2491801 1st Qu.:5.000 1st Qu.:0.05683 1st Qu.:3
## Median :-0.0002548 Median :7.000 Median :0.06329 Median :3
## Mean :-0.0004723 Mean :6.024 Mean :0.08771 Mean :3
## 3rd Qu.: 0.2501331 3rd Qu.:9.000 3rd Qu.:0.09392 3rd Qu.:3
## Max. : 0.4999779 Max. :9.000 Max. :1.00000 Max. :3
## cue task target.type rt.subj
## Min. : 0.000 Length:18131 Length:18131 Min. : 0.01369
## 1st Qu.: 0.000 Class :character Class :character 1st Qu.: 0.31844
## Median : 5.000 Mode :character Mode :character Median : 0.52811
## Mean : 8.378 Mean : 0.74128
## 3rd Qu.:14.000 3rd Qu.: 0.82874
## Max. :35.000 Max. :32.77161
## rt.obj even.digit seed obj.resp
## Min. : 0.00004 Min. :2.000 Min. : 920 Length:18131
## 1st Qu.: 0.53662 1st Qu.:4.000 1st Qu.:21013 Class :character
## Median : 0.88568 Median :4.000 Median :44495 Mode :character
## Mean : 1.16186 Mean :4.996 Mean :41972
## 3rd Qu.: 1.37276 3rd Qu.:6.000 3rd Qu.:56353
## Max. :291.83119 Max. :8.000 Max. :89278
## subject File correct
## Min. : 1.00 Length:18131 0: 4545
## 1st Qu.: 8.00 Class :character 1:13586
## Median :15.00 Mode :character
## Mean :15.06
## 3rd Qu.:22.00
## Max. :29.00
Description of variables:

- trial.type: whether the trial is an experimental trial or a "staircase" trial. The staircase trials served to calibrate the contrast of the target stimulus to each individual's threshold. Should be coded as a factor.
- pas: Perceptual Awareness Scale with 4 categories: No Experience (1, no impression of the stimulus), Weak Glimpse (2, a feeling that something has been shown), Almost Clear Experience (3, ambiguous experience of the stimulus), Clear Experience (4, non-ambiguous experience of the stimulus). This is a subjective measure of the degree to which a person is aware of having perceived something. Should be coded as a factor.
- trial: Trial number. Could be coded as either numeric or a factor, depending on the intent.
- target.contrast: Contrast of the target stimulus relative to the background. This was adjusted to match each individual participant's threshold (via the QUEST algorithm). Should be coded as numeric.
- cue: Cue provided to the participants indicating the set of possible digits that might appear on each trial. The cue could be drawn from a set of 2, 4 or 8 stimuli: single digits, pairs of digits or quadruplets (even though the data on quadruplets says 0). Should be coded as a factor.
- task: Whether the cue consisted of two single digits, two pairs of digits or two groups of four digits (singles, pairs, quadruplet). Should be coded as a factor.
- target.type: Whether the target presented was an odd or even number (odd/even). Should be coded as a factor.
- rt.subj: Time taken to answer the PAS rating. Should be coded as numeric.
- rt.obj: Time taken to answer whether the target was odd or even. Should be coded as numeric.
- obj.resp: Whether the participant answered that the target was odd (o) or even (e). Should be coded as a factor.
- subject: Unique participant number. Should be coded as a factor.
- correct: Whether the subject answered correctly (1) or not (0). Should be coded as a factor or a logical variable.
data_exp_2$pas <- as.factor(data_exp_2$pas)
data_exp_2$subject <- as.factor(data_exp_2$subject)
iii. for the staircasing part __only__, create a plot for each subject where you plot the estimated function (on the _target.contrast_ range from 0-1) based on the fitted values of a model (use `glm`) that models _correct_ as dependent on _target.contrast_. These plots will be our _no-pooling_ model. Comment on the fits - do we have enough data to plot the logistic functions?
#I was unsure what was intended, so I created 3 different solutions that I judged equally likely to be the intended one
#first solution: a single glm with subject interactions, i.e. a no-pooling model, plotted for each participant (I see this as the best solution)
data_exp_2_stair <- data_exp_2 %>%
filter(trial.type == "staircase")
mod_1_no <- glm(correct ~ target.contrast + subject + target.contrast:subject, data = data_exp_2_stair, family = 'binomial')
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
data_exp_2_stair$fitted <- fitted.values(mod_1_no)
data_exp_2_stair %>%
ggplot(aes(target.contrast, fitted)) +
geom_point() +
facet_wrap(~subject)
#second solution is just complete pooling and then plotting for each individual
mod_1_com <- glm(correct ~ target.contrast, data = data_exp_2_stair, family = 'binomial')
data_exp_2_stair$fitted <- fitted.values(mod_1_com)
data_exp_2_stair %>%
ggplot(aes(target.contrast, fitted)) +
geom_point() +
facet_wrap(~subject)
#third solution, making a function and a plot for each participant through a for-loop. However, this is a messy solution
mod_mod = list()
plot_plot = list()
for (i in 1:length(unique(data_exp_2$subject))){
data_exp_2_stair <- data_exp_2 %>%
filter(trial.type == "staircase") %>%
filter(subject == i)
mod_1_no_pool <- glm(correct ~ target.contrast, data = data_exp_2_stair, family = 'binomial')
mod_mod[[i]] <- mod_1_no_pool
data_exp_2_stair$fitted <- fitted.values(mod_1_no_pool)
plotty <- data_exp_2_stair %>%
ggplot(aes(target.contrast, fitted)) +
geom_point() +
ylim(0,1) +
ggtitle(paste(c('Fitted values for Subject', unique(data_exp_2$subject)[i]), collapse=', ' ))
plot_plot[[i]] <- plotty
}
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
plot_plot
## (29 per-subject plots, [[1]] through [[29]], rendered here)
First of all, a comment on the proposed solutions: the first and third solutions produce exactly the same plots. One fits a single no-pooling model with subject interactions; the other fits a complete-pooling model to one subject at a time through a for-loop, which yields the same per-subject estimates.
Whether the data are evenly distributed across the contrast range or clustered in a few intervals varies from participant to participant. Some subjects have data well suited for visualising logistic functions, while others are missing important data in the middle of the range.
iv. on top of those plots, add the estimated functions (on the _target.contrast_ range from 0-1) for each subject based on partial pooling model (use `glmer` from the package `lme4`) where unique intercepts and slopes for _target.contrast_ are modelled for each _subject_
#I choose to continue with solution 1, since it is easier to manipulate
data_exp_2_stair <- data_exp_2 %>%
filter(trial.type == "staircase")
mod_1_no <- glm(correct ~ target.contrast + subject + target.contrast:subject, data = data_exp_2_stair, family = 'binomial')
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
mod_1_par <- glmer(correct ~ target.contrast + (target.contrast|subject), data = data_exp_2_stair, family = 'binomial')
data_exp_2_stair$fitted_no_pooling <- fitted.values(mod_1_no)
data_exp_2_stair$fitted_par_pooling <- fitted.values(mod_1_par)
data_exp_2_stair <- data_exp_2_stair %>%
pivot_longer(c(fitted_no_pooling, fitted_par_pooling))
data_exp_2_stair %>%
ggplot(aes(target.contrast, value, colour = name)) +
geom_point() +
facet_wrap(~subject) +
ylab("Fitted Values") +
xlab("Contrast of Target Stimulus") +
ggtitle("Plotting of No Pooling and Partial Pooling models")
v. in your own words, describe how the partial pooling model allows for a better fit for each subject
In general the partial pooling model "flattens" each participant's function, making it more conservative and probably more generalisable: it smooths out some of the idiosyncratic fluctuations for each participant by shrinking the individual estimates towards the group mean.
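This shrinkage idea can be illustrated with a toy normal-normal calculation (all numbers below are assumed for illustration, not taken from the data): the partial-pooling estimate is a precision-weighted average of a subject's own mean and the grand mean.

```r
# Toy shrinkage sketch (all values assumed for illustration)
subject_mean <- 0.90   # this subject's own estimate (no pooling)
grand_mean   <- 0.75   # estimate across all subjects (complete pooling)
n      <- 20           # number of trials for this subject
sigma2 <- 0.04         # assumed within-subject variance
tau2   <- 0.01         # assumed between-subject variance

# Weight on the subject's own data grows with n and shrinks with sigma2
w <- (n / sigma2) / (n / sigma2 + 1 / tau2)
partial <- w * subject_mean + (1 - w) * grand_mean
partial  # 0.875 -- pulled from 0.90 towards the grand mean
```

With few trials or large within-subject noise, the weight w drops and the subject's estimate is pulled harder towards the group mean.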
Now we only look at the experiment trials (trial.type == "experiment")
data_exp_2_exp <- data_exp_2 %>%
filter(trial.type == "experiment")
#I pick four subjects at random
set.seed(1234)
random_subject <- sample(1:29, 4)
models <- list()
for (i in 1:length(random_subject)){
data_exp_2_exp_sub <- data_exp_2 %>%
filter(trial.type == "experiment") %>%
filter(subject == random_subject[i])
mod_2_int <- lm(rt.obj~1, data = data_exp_2_exp_sub)
models[[i]] <- mod_2_int
title = paste(c('Normal QQ-plot of Subject', random_subject[i]), collapse=', ' )
qqnorm(resid(mod_2_int), main = title);qqline(resid(mod_2_int), col = 'green')
}
i. comment on these
I randomly chose subjects 16, 22, 26 and 28 and evaluated their residuals based on an intercept-only model (fitted on each subject's data separately, i.e. no pooling). They all look positively skewed (some heavily so).
ii. does a log-transformation of the response time data improve the Q-Q-plots?
models <- list()
for (i in 1:length(random_subject)){
data_exp_2_exp_sub <- data_exp_2 %>%
filter(trial.type == "experiment") %>%
filter(subject == random_subject[i])
mod_2_int <- lm(log(rt.obj)~1, data = data_exp_2_exp_sub)
models[[i]] <- mod_2_int
title = paste(c('Normal QQ-plot of Subject', random_subject[i]), collapse=', ' )
qqnorm(resid(mod_2_int), main = title);qqline(resid(mod_2_int), col = 'green')
}
Yes, a log-transformation does improve the residuals quite substantially. Subject 16 still looks slightly positively skewed and subject 26 now looks slightly negatively skewed, but it is a big improvement compared to the residuals before the log-transformation.
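The effect can be reproduced on simulated data (the simulation assumptions are mine, not the experiment's): log-normally distributed "response times" are strongly right-skewed, and taking the log brings the skewness close to zero.

```r
set.seed(1)
rt <- rlnorm(1000, meanlog = -0.5, sdlog = 0.8)  # right-skewed toy RTs

# Simple sample-skewness helper
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
skewness(rt)       # large and positive
skewness(log(rt))  # close to zero, since log(rt) is normal
```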
REML=FALSE in your lmer-specification)
First of all, I would model random intercepts for subject. Random intercepts for trial and pas could also be argued for. Lastly, I would also like to have a random slope for task.
data_exp_2_exp <- data_exp_2 %>%
filter(trial.type == "experiment")
mod_2_par_1 <- lmer(rt.obj ~ task + (1|subject), data = data_exp_2_exp, REML = F)
mod_2_par_2 <- lmer(rt.obj ~ task + (task|subject), data = data_exp_2_exp, REML = F)
## boundary (singular) fit: see ?isSingular
mod_2_par_3 <- lmer(rt.obj ~ task + (1|subject) + (1|pas), data = data_exp_2_exp, REML = F)
mod_2_par_4 <- lmer(rt.obj ~ task + (task|subject) + (1|pas), data = data_exp_2_exp, REML = F)
## boundary (singular) fit: see ?isSingular
mod_2_par_5 <- lmer(rt.obj ~ task + (1|subject) + (1|pas) + (1|trial), data = data_exp_2_exp, REML = F)
anova(mod_2_par_1, mod_2_par_2, mod_2_par_3, mod_2_par_4, mod_2_par_5)
## Data: data_exp_2_exp
## Models:
## mod_2_par_1: rt.obj ~ task + (1 | subject)
## mod_2_par_3: rt.obj ~ task + (1 | subject) + (1 | pas)
## mod_2_par_5: rt.obj ~ task + (1 | subject) + (1 | pas) + (1 | trial)
## mod_2_par_2: rt.obj ~ task + (task | subject)
## mod_2_par_4: rt.obj ~ task + (task | subject) + (1 | pas)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## mod_2_par_1 5 61940 61977 -30965 61930
## mod_2_par_3 6 61917 61962 -30953 61905 24.7204 1 6.628e-07 ***
## mod_2_par_5 7 61919 61972 -30953 61905 0.0251 1 0.8741
## mod_2_par_2 10 61944 62018 -30962 61924 0.0000 3 1.0000
## mod_2_par_4 11 61922 62004 -30950 61900 23.7157 1 1.117e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
r.squaredGLMM(mod_2_par_1)
## Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
## R2m R2c
## [1,] 0.0008256517 0.01464871
r.squaredGLMM(mod_2_par_2)
## R2m R2c
## [1,] 0.0008256517 0.01530297
r.squaredGLMM(mod_2_par_3)
## R2m R2c
## [1,] 0.0006726226 0.01701042
r.squaredGLMM(mod_2_par_4)
## R2m R2c
## [1,] 0.0006757839 0.01749113
r.squaredGLMM(mod_2_par_5)
## R2m R2c
## [1,] 0.000672915 0.01738404
The two models with a random slope for task produce singular fits. Of the remaining models, the best one has random intercepts for subject and pas.
ii. explain in your own words what your chosen models says about response times between the different tasks
summary(mod_2_par_3)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: rt.obj ~ task + (1 | subject) + (1 | pas)
## Data: data_exp_2_exp
##
## AIC BIC logLik deviance df.resid
## 61917.5 61962.1 -30952.7 61905.5 12522
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.640 -0.153 -0.065 0.046 101.556
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 0.1013 0.3183
## pas (Intercept) 0.0342 0.1849
## Residual 8.1542 2.8555
## Number of obs: 12528, groups: subject, 29; pas, 4
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.08476 0.11852 9.152
## taskquadruplet -0.16106 0.06251 -2.576
## tasksingles -0.15571 0.06283 -2.478
##
## Correlation of Fixed Effects:
## (Intr) tskqdr
## taskqudrplt -0.262
## tasksingles -0.267 0.495
The best performing model has task as a fixed effect with random intercepts for subject and pas. The beta values for the fixed effects show that trials with quadruplets have a 0.161 s faster reaction time than trials with pairs, and trials with singles a 0.156 s faster reaction time than trials with pairs. The random effects show that the variation between subjects was larger than the variation between pas categories.
mod_2_par_6 <- lmer(rt.obj ~ task*pas + (1|subject), data = data_exp_2_exp, REML = F)
i. how many types of group intercepts (random effects) can you add without ending up with convergence issues or singular fits?
mod_2_par_7 <- lmer(rt.obj ~ task*pas + (1|subject) + (1|trial) + (1|target.type), data = data_exp_2_exp, REML = F)
mod_2_par_8 <- lmer(rt.obj ~ task*pas + (1|subject) + (1|trial) + (1|target.type) + (1|obj.resp), data = data_exp_2_exp, REML = F)
## boundary (singular) fit: see ?isSingular
Three is possible; adding a fourth results in a singular fit
ii. create a model by adding random intercepts (without modelling slopes) that results in a singular fit - then use `print(VarCorr(<your.model>), comp='Variance')` to inspect the variance vector - explain why the fit is singular (Hint: read the first paragraph under details in the help for `isSingular`)
print(VarCorr(mod_2_par_8), comp = 'Variance')
## Groups Name Variance
## trial (Intercept) 0.0030073
## subject (Intercept) 0.0995839
## obj.resp (Intercept) 0.0067261
## target.type (Intercept) 0.0000000
## Residual 8.1414480
iii. in your own words - how could you explain why your model would result in a singular fit?
When it comes to the last random effect (target.type), there is no more variance left to model: its variance is estimated to be exactly zero, which puts the estimate on the boundary of the parameter space. The data have been divided into such small chunks that the remaining terms already account for the group-level variability, so the extra random effect contributes nothing.
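Numerically, "singular" here just means the estimated random-effects covariance matrix sits on the boundary of its parameter space; with independent intercepts that is simply a variance of exactly zero. A minimal sketch using the variances printed by VarCorr above:

```r
# Variances from the VarCorr output above (target.type is exactly 0)
vars <- c(trial = 0.0030073, subject = 0.0995839,
          obj.resp = 0.0067261, target.type = 0.0000000)

# A zero variance puts the estimate on the boundary -> singular fit
any(vars == 0)  # TRUE
```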
data.count. count should indicate the number of times they categorized their experience as pas 1-4 for each task. I.e. the data frame would have for subject 1: for task:singles, pas1 was used # times, pas2 was used # times, pas3 was used # times and pas4 was used # times. You would then do the same for task:pairs and task:quadruplet.
## you can start from this if you want to, but you can also make your own from scratch
#data.count <- data.frame(count = numeric(),
# pas = numeric(), ## remember to make this into a factor afterwards
# task = numeric(), ## and this too
# subject = numeric()) ## and this too
data.count <- data_exp_2_exp %>%
group_by(subject, task, pas) %>%
summarise(count = n())
## `summarise()` has grouped output by 'subject', 'task'. You can override using the `.groups` argument.
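One subtlety worth noting (illustrated on toy data, not the real data frame): `group_by()` + `summarise(n())` silently drops subject/task/pas combinations that never occur, whereas `table()` keeps them as zero counts, which can matter for a Poisson model of counts.

```r
# Toy data: subject 1 never uses pas 3
df <- data.frame(subject = c(1, 1, 1, 2),
                 pas     = factor(c(1, 1, 2, 3), levels = 1:3))

# aggregate()-style counting drops the empty cells ...
counts_drop <- aggregate(list(count = df$pas),
                         by = df[c("subject", "pas")], FUN = length)
nrow(counts_drop)  # 3 rows: empty combinations are missing

# ... while table() keeps zero cells
counts_full <- as.data.frame(table(subject = df$subject, pas = df$pas))
nrow(counts_full)  # 6 rows: all subject x pas combinations, zeros included
```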
mod_3_par_1 <- glmer(count ~ task*pas + (pas|subject), data = data.count, family = "poisson")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0117428 (tol = 0.002, component 1)
i. which family should be used?
The family should be poisson, since the outcome is count data. However, I do not know this distribution and form of regression well, so my interpretations of the model will be limited.
ii. why is a slope for _pas_ not really being modelled?
mod_3_par_1
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: count ~ task * pas + (pas | subject)
## Data: data.count
## AIC BIC logLik deviance df.resid
## 2778.988 2861.822 -1367.494 2734.988 297
## Random effects:
## Groups Name Std.Dev. Corr
## subject (Intercept) 0.7484
## pas2 0.8085 -0.73
## pas3 1.0842 -0.91 0.80
## pas4 1.9609 -0.91 0.49 0.80
## Number of obs: 319, groups: subject, 29
## Fixed Effects:
## (Intercept) taskquadruplet tasksingles
## 3.61463 0.06079 -0.23492
## pas2 pas3 pas4
## -0.04106 -0.37127 -0.94328
## taskquadruplet:pas2 tasksingles:pas2 taskquadruplet:pas3
## -0.04416 0.17449 -0.11104
## tasksingles:pas3 taskquadruplet:pas4 tasksingles:pas4
## 0.23755 -0.11778 0.59811
## optimizer (Nelder_Mead) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings
Well, pas is a factor, so the model estimates a separate contrast (dummy-coded difference) from pas1 to each of the other levels, rather than one general slope as it would for a continuous variable.
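This can be seen directly from the design matrix R builds for a factor: each level beyond the reference gets its own dummy column, so there is one coefficient per level contrast rather than a single slope column (a minimal base-R sketch):

```r
pas <- factor(c(1, 2, 3, 4))

# One dummy column per non-reference level (pas2, pas3, pas4),
# not a single numeric slope column
model.matrix(~ pas)
colnames(model.matrix(~ pas))  # "(Intercept)" "pas2" "pas3" "pas4"
```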
iii. if you get a convergence error, try another algorithm (the default is the _Nelder_Mead_) - try (_bobyqa_) for which the `dfoptim` package is needed. In `glmer`, you can add the following for the `control` argument: `glmerControl(optimizer="bobyqa")` (if you are interested, also have a look at the function `allFit`)
mod_3_par_2 <- glmer(count ~ task*pas + (pas|subject), data = data.count, family = "poisson", control = glmerControl(optimizer = "bobyqa"))
mod_3_par_2
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: count ~ task * pas + (pas | subject)
## Data: data.count
## AIC BIC logLik deviance df.resid
## 2778.987 2861.822 -1367.494 2734.987 297
## Random effects:
## Groups Name Std.Dev. Corr
## subject (Intercept) 0.7502
## pas2 0.8101 -0.73
## pas3 1.0864 -0.91 0.80
## pas4 1.9647 -0.91 0.50 0.80
## Number of obs: 319, groups: subject, 29
## Fixed Effects:
## (Intercept) taskquadruplet tasksingles
## 3.61415 0.06071 -0.23490
## pas2 pas3 pas4
## -0.04097 -0.37076 -0.94171
## taskquadruplet:pas2 tasksingles:pas2 taskquadruplet:pas3
## -0.04411 0.17454 -0.11069
## tasksingles:pas3 taskquadruplet:pas4 tasksingles:pas4
## 0.23754 -0.11779 0.59797
iv. when you have a converging fit - fit a model with only the main effects of _pas_ and _task_. Compare this with the model that also includes the interaction
mod_3_par_3 <- glmer(count ~ task + pas + (pas|subject), data = data.count, family = "poisson", control = glmerControl(optimizer = "bobyqa"))
mod_3_par_2;mod_3_par_3
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: count ~ task * pas + (pas | subject)
## Data: data.count
## AIC BIC logLik deviance df.resid
## 2778.987 2861.822 -1367.494 2734.987 297
## Random effects:
## Groups Name Std.Dev. Corr
## subject (Intercept) 0.7502
## pas2 0.8101 -0.73
## pas3 1.0864 -0.91 0.80
## pas4 1.9647 -0.91 0.50 0.80
## Number of obs: 319, groups: subject, 29
## Fixed Effects:
## (Intercept) taskquadruplet tasksingles
## 3.61415 0.06071 -0.23490
## pas2 pas3 pas4
## -0.04097 -0.37076 -0.94171
## taskquadruplet:pas2 tasksingles:pas2 taskquadruplet:pas3
## -0.04411 0.17454 -0.11069
## tasksingles:pas3 taskquadruplet:pas4 tasksingles:pas4
## 0.23754 -0.11779 0.59797
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: count ~ task + pas + (pas | subject)
## Data: data.count
## AIC BIC logLik deviance df.resid
## 2933.264 2993.507 -1450.632 2901.264 303
## Random effects:
## Groups Name Std.Dev. Corr
## subject (Intercept) 0.7500
## pas2 0.8097 -0.73
## pas3 1.0835 -0.91 0.80
## pas4 1.9468 -0.91 0.50 0.80
## Number of obs: 319, groups: subject, 29
## Fixed Effects:
## (Intercept) taskquadruplet tasksingles pas2 pas3
## 3.562402 0.006407 -0.001445 -0.004975 -0.334531
## pas4
## -0.756519
v. indicate which of the two models, you would choose and why
anova(mod_3_par_2, mod_3_par_3)
## Data: data.count
## Models:
## mod_3_par_3: count ~ task + pas + (pas | subject)
## mod_3_par_2: count ~ task * pas + (pas | subject)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## mod_3_par_3 16 2933.3 2993.5 -1450.6 2901.3
## mod_3_par_2 22 2779.0 2861.8 -1367.5 2735.0 166.28 6 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on AIC values we should clearly choose the model with the interaction. However, it adds 6 more parameters, making the model substantially more complex. Without the model comparison I would find it hard to evaluate whether an interaction effect is theoretically justified: I do not know whether to expect the two variables to interact in changing the counts in the different pas categories.
vi. based on your chosen model - write a short report on what this says about the distribution of ratings as dependent on _pas_ and _task_
mod_3_par_2
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: count ~ task * pas + (pas | subject)
## Data: data.count
## AIC BIC logLik deviance df.resid
## 2778.987 2861.822 -1367.494 2734.987 297
## Random effects:
## Groups Name Std.Dev. Corr
## subject (Intercept) 0.7502
## pas2 0.8101 -0.73
## pas3 1.0864 -0.91 0.80
## pas4 1.9647 -0.91 0.50 0.80
## Number of obs: 319, groups: subject, 29
## Fixed Effects:
## (Intercept) taskquadruplet tasksingles
## 3.61415 0.06071 -0.23490
## pas2 pas3 pas4
## -0.04097 -0.37076 -0.94171
## taskquadruplet:pas2 tasksingles:pas2 taskquadruplet:pas3
## -0.04411 0.17454 -0.11069
## tasksingles:pas3 taskquadruplet:pas4 tasksingles:pas4
## 0.23754 -0.11779 0.59797
The best performing model has task, pas and their interaction as fixed effects, with a random intercept for subject and a random slope for pas. The model shows that the count for each pas category changes with both pas category and task. However, it is interesting to note that the direction of the slope (positive or negative) depends heavily on the interaction between task and pas.
vii. include a plot that shows the estimated amount of ratings for four subjects of your choosing
random_subject <- sample(1:29, 4)
plot_final <- list()
data.count$fitted <- fitted.values(mod_3_par_2)
for (i in 1:length(random_subject)){
plot_temp <- data.count %>%
filter(subject == random_subject[i]) %>%
ggplot(aes(pas, fitted, fill = pas)) +
geom_bar(stat = "identity") +
ggtitle(paste(c('Estimated amount of ratings for Subject', random_subject[i]), collapse=', ' )) + ylim(0,300) +
ylab("Estimated count") +
xlab("Perceptual Awareness Scale Rating")
plot_final[[i]] <- plot_temp
}
plot_final
## (four bar plots, [[1]] through [[4]], rendered here)
## Warning: Removed 1 rows containing missing values (geom_bar).
data_exp_2_exp <- data_exp_2 %>%
filter(trial.type == "experiment")
mod_3_par_4 <- glmer(correct ~ task + (1|subject), data = data_exp_2_exp, family = "binomial")
mod_3_par_4
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: correct ~ task + (1 | subject)
## Data: data_exp_2_exp
## AIC BIC logLik deviance df.resid
## 13636.288 13666.031 -6814.144 13628.288 12524
## Random effects:
## Groups Name Std.Dev.
## subject (Intercept) 0.6018
## Number of obs: 12528, groups: subject, 29
## Fixed Effects:
## (Intercept) taskquadruplet tasksingles
## 1.11896 -0.07496 0.16603
i. does _task_ explain performance?
logit <- function(x) log(x / (1 - x))
inv.logit <- function(x) exp(x) / (1 + exp(x))
inv.logit(1.11896)
## [1] 0.7537958
inv.logit(1.11896 - 0.07496)
## [1] 0.7396211
inv.logit(1.11896 + 0.16603)
## [1] 0.783298
I would not say that task explains performance well. The three different tasks have approximately the same probability of a correct answer, ranging from 73.96% to 78.33%.
ii. add _pas_ as a main effect on top of _task_ - what are the consequences of that?
mod_3_par_5 <- glmer(correct ~ task + pas + (1|subject), data = data_exp_2_exp, family = "binomial")
mod_3_par_5
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: correct ~ task + pas + (1 | subject)
## Data: data_exp_2_exp
## AIC BIC logLik deviance df.resid
## 12260.413 12312.463 -6123.207 12246.413 12521
## Random effects:
## Groups Name Std.Dev.
## subject (Intercept) 0.4886
## Number of obs: 12528, groups: subject, 29
## Fixed Effects:
## (Intercept) taskquadruplet tasksingles pas2 pas3
## 0.14963 -0.02659 -0.03641 0.88736 1.87055
## pas4
## 2.88685
inv.logit(0.14963)
## [1] 0.5373379
inv.logit(0.14963 + 2.88685)
## [1] 0.9541952
By adding PAS as a main effect the weights for the task variables have changed, but a lot more variance has also been explained. It is interesting that a pairs trial with a PAS of 1 has a 53.73% chance of being correct, while a pairs trial with a PAS of 4 has a 95.42% chance. This makes good sense, since a PAS of 4 describes a situation where the participant was clearly aware of experiencing/perceiving the target number. PAS seems to be a very good predictor of correct.
iii. now fit a multilevel model that models _correct_ as dependent on _pas_ with a unique intercept for each _subject_
mod_3_par_6 <- glmer(correct ~ pas + (1|subject), data = data_exp_2_exp, family = "binomial")
mod_3_par_6
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: correct ~ pas + (1 | subject)
## Data: data_exp_2_exp
## AIC BIC logLik deviance df.resid
## 12256.881 12294.060 -6123.441 12246.881 12523
## Random effects:
## Groups Name Std.Dev.
## subject (Intercept) 0.4883
## Number of obs: 12528, groups: subject, 29
## Fixed Effects:
## (Intercept) pas2 pas3 pas4
## 0.1301 0.8864 1.8689 2.8820
iv. finally, fit a model that models the interaction between _task_ and _pas_ and their main effects
mod_3_par_7 <- glmer(correct ~ pas*task + (1|subject), data = data_exp_2_exp, family = "binomial")
mod_3_par_7
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: correct ~ pas * task + (1 | subject)
## Data: data_exp_2_exp
## AIC BIC logLik deviance df.resid
## 12265.772 12362.436 -6119.886 12239.772 12515
## Random effects:
## Groups Name Std.Dev.
## subject (Intercept) 0.4902
## Number of obs: 12528, groups: subject, 29
## Fixed Effects:
## (Intercept) pas2 pas3
## 0.14960 0.91505 1.82975
## pas4 taskquadruplet tasksingles
## 2.83306 0.03068 -0.11076
## pas2:taskquadruplet pas3:taskquadruplet pas4:taskquadruplet
## -0.11915 -0.06631 -0.17373
## pas2:tasksingles pas3:tasksingles pas4:tasksingles
## 0.05846 0.20715 0.28174
v. describe in your words which model is the best in explaining the variance in accuracy
anova(mod_3_par_4, mod_3_par_5, mod_3_par_6, mod_3_par_7)
## Data: data_exp_2_exp
## Models:
## mod_3_par_4: correct ~ task + (1 | subject)
## mod_3_par_6: correct ~ pas + (1 | subject)
## mod_3_par_5: correct ~ task + pas + (1 | subject)
## mod_3_par_7: correct ~ pas * task + (1 | subject)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## mod_3_par_4 4 13636 13666 -6814.1 13628
## mod_3_par_6 5 12257 12294 -6123.4 12247 1381.4067 1 <2e-16 ***
## mod_3_par_5 7 12260 12312 -6123.2 12246 0.4685 2 0.7912
## mod_3_par_7 13 12266 12362 -6119.9 12240 6.6410 6 0.3553
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The best performing model is the one with only PAS as a main effect and a random intercept for subject. The other models come close, but despite having more parameters they only explain the data slightly better. The simpler model therefore has the better AIC, which is why I choose it as the model that best explains the variance in accuracy.